
Conversation

@nutanix-Hrushikesh
Contributor

@nutanix-Hrushikesh nutanix-Hrushikesh commented Oct 6, 2025

Description
This PR adds complete support for OpenAI’s image generation endpoint (/v1/images/generations) across the Envoy AI Gateway. It introduces a processor, translation layer, tracing and metrics instrumentation, Brotli decoding, example client/service updates, and repo hygiene improvements.

Changes

  • ExtProc (image generation)

    • Added imageGenerationProcessorFactory and registered it in ExtProc main.
    • Implemented imageGenerationProcessorRouterFilter and imageGenerationProcessorUpstreamFilter.
    • Added request/response header and body processing, including retry handling and auth passthrough.
    • Introduced imageGenerationMetrics and integrated with the processor to record image-specific telemetry.
    • Improved diagnostics with debug logs for processor selection/instantiation when matching routes.
  • Translator (OpenAI to OpenAI)

    • Added ImageGenerationTranslator interface and ImageGenerationMetadata to standardize request/response translation for images.
    • Implemented OpenAI to OpenAI translator:
      • Request body transformation (model overrides, forced mutation).
      • Response headers/body parsing with OpenAI SDK v2 schema to avoid drift.
  • Tracing

    • Extended Tracing API to support image generation with ImageGenerationTracer and ImageGenerationRecorder.
    • Implemented imageGenerationTracer and imageGenerationSpan:
      • Span start, header injection, response recording, and well-defined error termination paths.
    • Added OpenInference-based ImageGenerationRecorder for router filter instrumentation and a Noop tracer.
  • Metrics

    • Implemented ImageGenerationMetrics with methods to record lifecycle events, model/backend selection, token usage, and image generation stats.
    • Extended GenAI metrics with image-specific attributes:
      • genaiAttributeImageCount, genaiAttributeImageModel, genaiAttributeImageSize
    • Added operation type: genaiOperationImageGeneration.
  • Utilities

    • Added decodeContentIfNeeded to support Brotli encoding alongside gzip for modern upstreams (a hedged sketch follows this list).
  • CLI, docs, and examples

    • cmd/aigw/docker-compose.yaml: added image-generation service (curl-based client) modeled after chat/embeddings.
    • cmd/aigw/README.md: added image generation usage (service and curl examples) and embeddings “create-embeddings” instructions; updated OTEL section to include image-generation.
    • OTLP/OTEL Compose flow updated to demonstrate image generation alongside chat and embeddings.
  • Tests and config

    • Added /v1/images/generations route in tests/extproc/vcr/envoy.yaml.
    • Added test coverage for ExtProc, translator, metrics, and tracing (details below).
  • Repo hygiene

    • .gitignore: ignore tests/e2e-inference-extension/logs/ to prevent accidental log check-ins.
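
For illustration, a minimal sketch of the content-decoding helper described under Utilities; the exact signature of decodeContentIfNeeded in this PR may differ, so treat the wiring below as an assumption rather than the merged implementation:

package extproc

import (
	"bytes"
	"compress/gzip"
	"io"

	"github.com/andybalholm/brotli"
)

// decodeContentIfNeeded wraps the response body in a decoding reader when the
// upstream sets Content-Encoding to gzip or br; other encodings pass through.
func decodeContentIfNeeded(body []byte, contentEncoding string) (io.Reader, error) {
	switch contentEncoding {
	case "gzip":
		return gzip.NewReader(bytes.NewReader(body))
	case "br":
		return brotli.NewReader(bytes.NewReader(body)), nil
	default:
		return bytes.NewReader(body), nil
	}
}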

Bug Fixes / Improvements

  • Error handling: Standardized Images API error parsing (including non-JSON upstream errors) via ImageGenerationError.
  • Observability: Full tracing and metrics for image generation requests; span/metric attributes include image count, model, and size (a hedged span sketch follows this list).
  • Compatibility/Performance: Brotli decoding support for modern content encodings; improved debug logging around processor selection and instantiation.
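
A hedged sketch of what recording those span attributes could look like with the OpenTelemetry Go SDK; the tracer name, span name, and attribute keys here are illustrative assumptions, not the PR's final conventions:

package tracing

import (
	"context"

	"go.opentelemetry.io/otel"
	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/trace"
)

// startImageGenerationSpan starts a client span for an image generation
// request and records the attributes mentioned above (count, model, size).
func startImageGenerationSpan(ctx context.Context, model, size string, count int) (context.Context, trace.Span) {
	ctx, span := otel.Tracer("aigw/imagegeneration").Start(ctx, "image_generation",
		trace.WithSpanKind(trace.SpanKindClient))
	span.SetAttributes(
		attribute.String("gen_ai.request.model", model),
		attribute.String("gen_ai.image.size", size),
		attribute.Int("gen_ai.image.count", count),
	)
	return ctx, span
}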

Tests

  • Unit tests
    • ExtProc image generation processor: supported routes, upstream scenarios, request/response handling, translator selection by API schema, retry behavior.
    • Translator (OpenAI -> OpenAI): request body transformations (model override, forced mutation), non-JSON error mapping, successful response parsing.
    • Metrics: token usage recording, image generation counters/histograms, header label mapping.
    • Tracing: image tracer/span behavior for basic and multi-image requests, and with pre-existing trace context.
    • OpenInference recorder: attribute construction and hooks for requests, responses, and errors.

Dependencies / Migrations

  • Added
    • github.com/andybalholm/brotli v1.2.0 (Brotli decoding).
    • github.com/xyproto/randomstring v1.0.5 (utility for upcoming features/tests).
  • Existing usage formalized
    • github.com/openai/openai-go/v2 leveraged for schema-safe decoding and tracing types.

Notes for Reviewers

  • Processor & filters: Validate routing and upstream filter logic for /v1/images/generations, including retry and header/body mutation behavior.
  • Translator: Check non-JSON error mapping and OpenAI SDK v2 schema usage to prevent drift; confirm response decoding reliability.
  • Tracing: Verify header injection, span names, and attribute coverage (count/model/size). Confirm OpenInference semantics and Noop behavior are preserved.
  • Metrics: Confirm attribute names, cardinality, and consistency with existing GenAI metrics; review token/image recording correctness.
  • Utilities: Ensure Brotli decode path coexists safely with gzip.
  • Docs & examples: Run the image-generation and create-embeddings services to validate the examples; confirm README clarity (see the request sketch after this list).
  • Repo hygiene: Validate .gitignore path excludes e2e inference extension logs as intended.
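
As a quick way to exercise the new route locally, a hedged sketch of a minimal client call against the gateway; the localhost port, model name, and payload below are illustrative assumptions, not the exact values used by the compose examples:

package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
)

func main() {
	// POST a tiny image generation request through the gateway's new route.
	payload := []byte(`{"model":"dall-e-2","prompt":"a tiny red square","n":1,"size":"256x256"}`)
	resp, err := http.Post("http://localhost:1975/v1/images/generations", "application/json", bytes.NewReader(payload))
	if err != nil {
		panic(err)
	}
	defer resp.Body.Close()
	body, _ := io.ReadAll(resp.Body)
	fmt.Println(resp.Status, string(body))
}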

@missBerg
Contributor

missBerg commented Oct 6, 2025

@nutanix-Hrushikesh some admin notes on this PR: the DCO and the PR style need to be fixed.

@codefromthecrypt
Contributor

To make this complete I would recommend following the pattern in testopenai (add a cassette for the new request and record it), then record a new span JSON in testopeninference (this ensures the data we capture reflects the actual implementation and isn't accidentally different). Both have READMEs, but let me know if any of it is unclear.

}

// Debug details about the processor selection.
if s.logger != nil {
Contributor

Is this required?

Contributor Author

I'll remove some of the extra logging.

Member

can you remove this entire block

@nutanix-Hrushikesh nutanix-Hrushikesh changed the title End-to-end OpenAI Image Generation support: processor, translator, tracing, metrics aigw: End-to-end OpenAI Image Generation support: processor, translator, tracing, metrics Oct 7, 2025
@nutanix-Hrushikesh nutanix-Hrushikesh changed the title aigw: End-to-end OpenAI Image Generation support: processor, translator, tracing, metrics aigw: end-to-end OpenAI Image Generation support: processor, translator, tracing, metrics Oct 7, 2025
@mathetake
Member

@nutanix-Hrushikesh do you want to make it more complete by following @codefromthecrypt's comment?

To make this complete I would recommend following the pattern in testopenai (add a cassette for the new request and record it), then record a new span JSON in testopeninference (this ensures the data we capture reflects the actual implementation and isn't accidentally different). Both have READMEs, but let me know if any of it is unclear.

@mathetake mathetake changed the title aigw: end-to-end OpenAI Image Generation support: processor, translator, tracing, metrics feat: end-to-end OpenAI Image Generation support: processor, translator, tracing, metrics Oct 7, 2025
@mathetake mathetake changed the title feat: end-to-end OpenAI Image Generation support: processor, translator, tracing, metrics feat: end-to-end OpenAI Image Generation support Oct 7, 2025
@nutanix-Hrushikesh nutanix-Hrushikesh force-pushed the image-generation branch 2 times, most recently from 390e4d0 to 7907692 on October 8, 2025 15:42
@nutanix-Hrushikesh
Contributor Author

nutanix-Hrushikesh commented Oct 8, 2025

To make this complete I would recommend following the pattern in testopenai (add a cassette for the new request and record it), then record a new span JSON in testopeninference (this ensures the data we capture reflects the actual implementation and isn't accidentally different). Both have READMEs, but let me know if any of it is unclear.

I have recorded the cassette and span. Please let me know if this is what's expected, @codefromthecrypt.

Contributor

@codefromthecrypt codefromthecrypt left a comment

Getting further! A couple of notes:

// ModelTextEmbedding3Small is the cheapest model usable with /embeddings.
ModelTextEmbedding3Small = "text-embedding-3-small"

// ModelDALLE2 is the DALL-E 2 model usable with /v1/images/generations.
Contributor

Previously, the rationale for an entry here was the cheapest model that can be used to produce test data. I would look up whether this, or a different one, is the cheapest. Only add one, and document it like the others, if it is the cheapest.

Similarly, when you make the cassette you will notice that the existing text-to-speech, image-to-text, etc. requests use the cheapest possible request in terms of cost and size. You can ask AI to help you figure that out; for example, I used Grok to figure out a cheap audio request.

Contributor Author

Used the smaller model gpt-image-1-mini.
Also, there is an intermittent issue in the test: the cassette file is sometimes not saved because the server is closed before the file is written to disk. Added a temporary fix to sleep 5 seconds before closing.

@nutanix-Hrushikesh nutanix-Hrushikesh force-pushed the image-generation branch 2 times, most recently from dc13e02 to dee84d4 on October 10, 2025 12:22
@mathetake
Member

will review next week 🙏 sorry for the delay

Member

@mathetake mathetake left a comment

Thanks @nutanix-Hrushikesh, I left a few minor comments. From now on, could you avoid force-pushing, per CONTRIBUTING.md, to make review easier. Also make sure that you remove any debugging lines.

.env.ollama Outdated
THINKING_MODEL=qwen3:1.7b
COMPLETION_MODEL=qwen2.5:0.5b
EMBEDDINGS_MODEL=all-minilm:33m
IMAGE_GENERATION_MODEL=dall-e-2
Member

how is this Ollama?

Contributor Author

I know this won't work, but there is no image generation model available with Ollama.


// Extract image generation metadata for metrics (model may be absent in SDK response)
imageMetadata.ImageCount = len(resp.Data)
imageMetadata.Model = ""
Member

Like other endpoints, we should assume that the requested model == response model if the response lacks the model name. So could you apply a patch like this?

diff --git a/internal/extproc/translator/imagegeneration_openai_openai.go b/internal/extproc/translator/imagegeneration_openai_openai.go
index 9e5d74ba..d5ab5ff5 100644
--- a/internal/extproc/translator/imagegeneration_openai_openai.go
+++ b/internal/extproc/translator/imagegeneration_openai_openai.go
@@ -6,6 +6,7 @@
 package translator
 
 import (
+       "cmp"
        "encoding/json"
        "fmt"
        "io"
@@ -32,11 +33,12 @@ type openAIToOpenAIImageGenerationTranslator struct {
        // The path of the images generations endpoint to be used for the request. It is prefixed with the OpenAI path prefix.
        path string
        // span is the tracing span for this request, inherited from the router filter.
-       span tracing.ImageGenerationSpan
+       span         tracing.ImageGenerationSpan
+       requestModel internalapi.RequestModel
 }
 
 // RequestBody implements [ImageGenerationTranslator.RequestBody].
-func (o *openAIToOpenAIImageGenerationTranslator) RequestBody(original []byte, _ *openaisdk.ImageGenerateParams, forceBodyMutation bool) (
+func (o *openAIToOpenAIImageGenerationTranslator) RequestBody(original []byte, p *openaisdk.ImageGenerateParams, forceBodyMutation bool) (
        headerMutation *extprocv3.HeaderMutation, bodyMutation *extprocv3.BodyMutation, err error,
 ) {
        var newBody []byte
@@ -47,6 +49,7 @@ func (o *openAIToOpenAIImageGenerationTranslator) RequestBody(original []byte, _
                        return nil, nil, fmt.Errorf("failed to set model name: %w", err)
                }
        }
+       o.requestModel = cmp.Or(o.modelNameOverride, p.Model)
 
        // Always set the path header to the images generations endpoint so that the request is routed correctly.
        headerMutation = &extprocv3.HeaderMutation{
@@ -144,9 +147,9 @@ func (o *openAIToOpenAIImageGenerationTranslator) ResponseBody(_ map[string]stri
                tokenUsage.TotalTokens = uint32(resp.Usage.TotalTokens)   //nolint:gosec
        }
 
-       // Extract image generation metadata for metrics (model may be absent in SDK response)
+       // Extract image generation metadata for metrics.
        imageMetadata.ImageCount = len(resp.Data)
-       imageMetadata.Model = ""
+       imageMetadata.Model = o.requestModel // Model is not present in the response, so we assume the request model == response model.
        imageMetadata.Size = string(resp.Size)
 
        return

Member

@nutanix-Hrushikesh this is still unresolved.

@codecov-commenter

codecov-commenter commented Oct 15, 2025

Codecov Report

❌ Patch coverage is 89.64646% with 41 lines in your changes missing coverage. Please review.
✅ Project coverage is 78.60%. Comparing base (80838bc) to head (2d25398).
⚠️ Report is 30 commits behind head on main.

Files with missing lines                                  Patch %   Lines
internal/extproc/imagegeneration_processor.go              87.72%   15 Missing and 12 partials ⚠️
internal/extproc/util.go                                     0.00%   6 Missing ⚠️
...xtproc/translator/imagegeneration_openai_openai.go       93.75%   2 Missing and 2 partials ⚠️
internal/metrics/image_generation_metrics.go                92.85%   2 Missing ⚠️
internal/tracing/tracing.go                                 75.00%   2 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main    #1280      +/-   ##
==========================================
+ Coverage   78.27%   78.60%   +0.32%     
==========================================
  Files         132      139       +7     
  Lines       13349    13745     +396     
==========================================
+ Hits        10449    10804     +355     
- Misses       2260     2287      +27     
- Partials      640      654      +14     

☔ View full report in Codecov by Sentry.

@nutanix-Hrushikesh nutanix-Hrushikesh force-pushed the image-generation branch 2 times, most recently from 82361c0 to fa27f96 on October 16, 2025 14:33
@mathetake
Member

@nutanix-Hrushikesh I see a bad merge here; could you clean up the commits?

Comment on lines 126 to 142
  # completion is the standard OpenAI client (`openai` in pip), instrumented
  # with the following OpenTelemetry instrumentation libraries:
  # - openinference-instrumentation-openai (completions spans)
  # - opentelemetry-instrumentation-httpx (HTTP client spans and trace headers)
  completion:
    build:
      context: ../../tests/internal/testopeninference
      dockerfile: Dockerfile.openai_client
      target: completion
    container_name: completion
    profiles: ["test"]
    env_file:
      - ../../.env.ollama
      - .env.otel.${COMPOSE_PROFILES:-console}
    environment:
      - OPENAI_BASE_URL=http://aigw:1975/v1
      - OPENAI_API_KEY=unused
Member

Do not delete irrelevant things.

Member

Can you not lie?

  # completion is the standard OpenAI client (`openai` in pip), instrumented
  # with the following OpenTelemetry instrumentation libraries:
  # - openinference-instrumentation-openai (completions spans)
  # - opentelemetry-instrumentation-httpx (HTTP client spans and trace headers)
  completion:
    build:
      context: ../../tests/internal/testopeninference
      dockerfile: Dockerfile.openai_client
      target: completion
    container_name: completion
    profiles: ["test"]
    env_file:
      - ../../.env.ollama
      - .env.otel.${COMPOSE_PROFILES:-console}
    environment:
      - OPENAI_BASE_URL=http://aigw:1975/v1
      - OPENAI_API_KEY=unused

Contributor Author

My bad, I think I was referring to the old file.

// Clear any existing env vars
t.Setenv("OPENAI_API_KEY", "")
t.Setenv("OPENAI_BASE_URL", "")
t.Setenv("AZURE_OPENAI_API_KEY", "")
Member

how is this relevant to this PR?

Contributor Author

Tests were failing locally; I added this to fix that, but I'll remove it.

{
	name: "run no arg",
	args: []string{"run"},
	env:  map[string]string{"OPENAI_API_KEY": "", "AZURE_OPENAI_API_KEY": ""},
Member

same

Comment on lines 677 to 678
ctx, cancel := context.WithCancel(t.Context()) //nolint: govet
ctx, cancel := context.WithCancel(t.Context())
defer cancel()
Member

irrelevant

Signed-off-by: Hrushikesh Patil <[email protected]>

// ImageGenerationError represents an error response from the OpenAI Images API.
// This schema matches OpenAI's documented error wire format.
type ImageGenerationError struct {
Member

Why can't you use openai.Error like in other places?

Member

@mathetake mathetake Oct 22, 2025

Can you revert the unrelated changes like the grouping etc., as well as bring back the reference URL to OTel?

The reason is that the "grouping comment" will be considered a comment for the first entry only and not for the others. I feel that is confusing, so I would rather not do that. Instead, if we really want it, we should put documentation comments on each of the constants rather than the partial grouping in the current state.

Contributor Author

sure

Contributor Author

Done

Member

@mathetake mathetake left a comment

almost there!

Use a different AIGW because Ollama does not currently support the image generation model.
This setup requires the OpenAI API key to be set in environment variables.

This is a temporary workaround until Ollama adds image generation support.

Signed-off-by: Hrushikesh Patil <[email protected]>
Comment on lines 59 to 63
imageInfo: mustRegisterHistogram(
	meter,
	"ai_gateway.image.generation",
	metric.WithDescription("Image generation request marker with image-specific attributes"),
),
Member

@mathetake mathetake Oct 22, 2025

Why do you need this? Can't you just use gen_ai.client.operation.duration or whatever existing, well-defined OTel metric, and add additional attributes for the image-specific data? I don't think this additional custom metric (not even documented in this PR) is necessary. Can you remove it?

Member

I even think having them all as attributes is a really bad idea, as "size" has infinite cardinality, and "image count" does as well.

// for metrics and observability.
type ImageGenerationMetadata struct {
	// ImageCount is the number of images generated in the response.
	ImageCount int
	// Model is the AI model used for image generation.
	Model string
	// Size is the size/dimensions of the generated images.
	Size string
}

So can you:

  • Remove this metric.
  • Remove ImageGenerationMetadata
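
For context, a hedged sketch of the alternative suggested above: record on the existing GenAI duration histogram and attach the image size as a bounded-cardinality attribute instead of keeping a dedicated metric. The attribute keys and function name here are assumptions for illustration:

package metrics

import (
	"context"

	"go.opentelemetry.io/otel/attribute"
	"go.opentelemetry.io/otel/metric"
)

// recordImageGeneration records the request duration on a shared GenAI
// histogram and tags it with the operation name and image size, avoiding a
// separate image-specific instrument.
func recordImageGeneration(ctx context.Context, duration metric.Float64Histogram, seconds float64, size string) {
	duration.Record(ctx, seconds, metric.WithAttributes(
		attribute.String("gen_ai.operation.name", "image_generation"),
		attribute.String("gen_ai.image.size", size), // e.g. "1024x1024"
	))
}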

Contributor Author

For size, there are only 7-8 size options. Image count has infinite cardinality. Should I keep size?

Member

Yes, fine, but please remove this metric anyway.

@mathetake mathetake enabled auto-merge (squash) October 23, 2025 16:54
@mathetake
Member

@nutanix-Hrushikesh thank you for the multiple iterations. It's good to finally see this landing!

@mathetake mathetake merged commit fa7749a into envoyproxy:main Oct 23, 2025
30 checks passed
@nutanix-Hrushikesh
Contributor Author

It's good to finally see this landing!

Thanks for all the feedback and guidance along the way!

AyushSawant18588 pushed a commit to AyushSawant18588/ai-gateway that referenced this pull request Oct 24, 2025
**Description**
This PR adds complete support for OpenAI’s image generation endpoint
(/v1/images/generations) across the Envoy AI Gateway. It introduces a
processor, translation layer, tracing and metrics instrumentation,
Brotli decoding, example client/service updates, and repo hygiene
improvements.

---------

Signed-off-by: Hrushikesh Patil <[email protected]>